How well does file size predict wide-area transfer time?
Authors
Abstract
In scheduling connections at busy web servers, it is commonly assumed that transmission duration (or time in system) is directly proportional to the size of the file transferred. For example, a scheduling discipline such as SRPT (shortest remaining processing time) could use this assumption to order connections according to the residual size of the transfer. However, with a diverse client population, network effects such as packet loss and heterogeneous end-to-end bandwidths and latencies render this assumption invalid. In this measurement study, we investigate the predictive value of file size in determining transfer time. We use the publicly available sanitized cache access logs that are collected daily as part of IRCache [16], the NLANR Web caching project, to examine HTTP traffic serviced by the NLANR caches over a week-long interval. Over this dataset, we first confirm an earlier finding: for small transfers of up to 30KB, there is virtually no correlation between file size and transfer time; moreover, transfer times vary over 5 orders of magnitude. For larger files, file size and transfer time become increasingly well correlated as file size increases, but predictions of transfer time from file size alone are still not highly accurate. Our findings motivate further investigation of incorporating network awareness into end-system scheduling disciplines.
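As a rough illustration of the kind of analysis the abstract describes, the following Python sketch computes the correlation between file size and transfer time within a chosen size range. It is not the authors' method: the use of a Pearson coefficient on log10-transformed values, the function names `pearson` and `log_correlation`, the reading of 30KB as 30 * 1024 bytes, and the sample data are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences of at least two samples")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log_correlation(samples, min_bytes=0, max_bytes=float("inf")):
    """Correlate log10(size) with log10(time) for sizes in [min_bytes, max_bytes).

    `samples` is an iterable of (size_bytes, transfer_time_ms) pairs.
    The log transform is an assumption of this sketch, chosen because
    transfer times span several orders of magnitude.
    """
    picked = [(s, t) for s, t in samples
              if min_bytes <= s < max_bytes and s > 0 and t > 0]
    xs = [math.log10(s) for s, _ in picked]
    ys = [math.log10(t) for _, t in picked]
    return pearson(xs, ys)


# Hypothetical samples (size in bytes, time in ms); not data from the study.
SMALL_LIMIT = 30 * 1024  # assumed byte value for the 30KB threshold
samples = [
    (100, 50), (1000, 5), (10000, 50),           # small transfers, no trend
    (100000, 2000), (1000000, 20000), (10000000, 200000),  # large transfers
]
small_r = log_correlation(samples, max_bytes=SMALL_LIMIT)
large_r = log_correlation(samples, min_bytes=SMALL_LIMIT)
```

With real log data, comparing `small_r` and `large_r` across several size ranges would show whether correlation strengthens as file size grows, which is the pattern the abstract reports.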
Similar resources
On Parameter Tuning of Data Transfer Protocol GridFTP in Wide-Area Grid Computing
In recent years, wide-area Grid computing has attracted considerable attention. By connecting geographically distributed computational resources via networks, wide-area Grid computing allows those resources to be used efficiently and makes large-scale scientific and engineering computation possible. In wide-area Grid computing, a data transfer protocol called GridFTP is used for larg...
JPARSS: A Java Parallel Network Package for Grid Computing
The emergence of high-speed wide-area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, owing to the network tuning of TCP window size needed to improve bandwidth and reduce latency on a high-speed wide-area network. This paper presents a Java package called JPARSS (Java Parallel Secure Stream (So...
Web Servers for Bulk File Transfer and Storage
GridSite has extended the industry-standard Apache web server for use within Grid projects, both by adding support for Grid security credentials such as GSI and VOMS, and with the GridHTTP protocol for bulk file transfer via HTTP. We describe how GridHTTP combines the security model of X.509/HTTPS with the performance of Apache, in local and wide area bulk transfer applications. GridSite also su...
Server-assisted Latency Management for Wide-area Distributed Systems
Many Internet services now employ wide-area platforms to improve the end-user experience in the WAN. To maintain close control over their remote nodes, these wide-area systems require low-latency dissemination of new updates for system configurations, customer requirements, and task lists at runtime. However, we observe that existing data transfer systems focus on resource efficiency for ope...
Redundant Parallel File Transfer with Anticipative Adjustment Mechanism in Data Grids
More and more applications emphasize the analysis of huge data sets and depend on data transmission. Data Grids enable the selection, sharing, and connection of a wide variety of geographically distributed computational and storage resources to meet the needs of large-scale data-intensive applications. Data grids consist of scattered computing and storage resources located in different countries/region...